
A virtual machine approach to fuzzing interfaces


In Redpanda, the core data structure used to move around nearly all bytes that flow through the system is called iobuf. Because it is used in so many contexts, the iobuf interface has grown over time and become more complex in order to accommodate a variety of use cases. Needless to say, the correctness of this utility is critical. But how do we go about testing it? How can we be sure that we are exercising all of its edge cases?

Enter automated fuzzing. We’ve been writing fuzz tools at Redpanda Data since the early days, and they’ve generally been based on a harness that executes a random schedule like the following snippet:

struct op {
    virtual ~op() = default;
    virtual void execute() = 0;
};
struct op_a : op { void execute() override { /* ... */ } };
struct op_b : op { void execute() override { /* ... */ } };

std::vector<op*> ops;
generate_random_schedule(ops);
for (auto* op : ops) {
    op->execute();
    verify();
}

Here, each op is an action that modifies the system state. The ops are run in a random order, and after each action completes, the state of the system is verified. What these operations are and what constitutes the verification step is completely dependent on the target of the testing. For example, it could be the iobuf interface, or it could be an entire subsystem.
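
For a target like iobuf, a concrete op in this style might look like the sketch below. The append_op type, its randomly generated payload, and the flat expected string that verify() would compare against are all illustrative here, not the actual harness code.

// Hypothetical op: append a random payload to the iobuf under test and
// mirror it into a flat std::string that verify() can compare against.
struct append_op : op {
    iobuf* target;           // the iobuf under test
    std::string* expected;   // flat model of the expected contents
    std::string payload;     // randomly generated bytes

    void execute() override {
        target->append(payload.data(), payload.size());
        expected->append(payload);
    }
};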

Thinking back, this was a very effective technique that helped us quickly find and address rare issues that might otherwise have been found by users. We were always aware that this random generation wasn’t particularly sophisticated, and that it had a tendency to generate uninteresting schedules. But run long enough, these tools would eventually reveal an edge case or behavior we had not considered.

I had been vaguely aware that fuzzing was a large area of study in computer science, with well-known folks talking about it in the context of their research. But it was only recently that I tried to use fuzzing tools myself and discovered just how rich the area is with prior work. See https://github.com/wcventure/FuzzingPaper for a taste. It turns out the topic of fuzzing nerd-sniped me for a solid week. But I walked away from that diversion with a long list of fuzzing-related topics that I want to deep dive into, as well as what I think is a solid approach to guided fuzzing for iobuf. That’s what we are going to look at in the remainder of this post.

The fuzzing target #

The target of our fuzzing is going to be a data structure called iobuf. Effectively, an iobuf can be thought of as a contiguous byte sequence, but one optimized to support operations that would be inefficient if the bytes were stored physically contiguously (e.g. random insertion). Under the hood, an iobuf stores its data in variable-sized segments as a means to deal with memory fragmentation and to efficiently support various types of operations by reducing the amount of data movement. Here is an abbreviated version of the iobuf interface:

class iobuf {
public:
    void reserve_capacity(size_t);

    void append(const char *, size_t);
    void prepend(const char *, size_t);

    // zero-copy an extent
    iobuf share(size_t, size_t) const;

    const_iterator cbegin() const;
    const_iterator cend() const;

    ...
};

The iobuf type is used just about everywhere data is passed around: arriving off of a network interface, being read from disk, or sending data to another core. It also contains a lightweight mechanism for performing zero-copy sharing using reference counting, which adds a dimension of complexity to the implementation.
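
For a feel of how that sharing is used, here is a minimal sketch. The interpretation of the two size_t parameters as an offset and a length is an assumption based on the signature and comment above.

iobuf buf;
buf.append("hello world", 11);

// Assuming share(offset, length): create a second iobuf that refers to the
// same underlying fragments for bytes [0, 5). No payload bytes are copied;
// reference counts on the fragments keep the memory alive for both views.
iobuf view = buf.share(0, 5);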

Given how important iobuf is and its non-trivial implementation, we’d like to explore ways to increase our confidence in its correctness by subjecting it to a variety of synthetic workloads. Additionally, we’d like to avoid a naive exploration of the state space, and instead generate interesting schedules that are more efficient at poking a fuzz target. Guided fuzzing using a tool like Clang’s libfuzzer is one way to do this.

Quick intro to libfuzzer #

Guided fuzzing is useful because, in contrast to naively generating a random schedule of operations, guided fuzzers can generate a schedule after observing the behavior of the program under previous schedules. This allows guided fuzzers to target goals such as increasing code coverage, and do so more efficiently.

In this post we’ll consider Clang’s fuzzing framework called libfuzzer. The main interface that libfuzzer provides is the following function. It serves as an upcall: libfuzzer invokes it with an opaque blob of bytes that represents a single instance of a test. Our task as users of libfuzzer is to turn these bytes into an experiment which tests the fuzzed target. How the input is transformed into an experiment is completely dependent on what is being tested, be it a compression algorithm, data format, or in our case, iobuf.

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    // do something interesting with {data, size}
}

I’m still quite new to guided fuzzers, but I have the impression that the task of deciding how to wrangle the fuzzer’s input data into an interesting experiment is a creative process that represents the majority of the work required to fuzz a target.

Covering fuzzing in general, or even libfuzzer specifically, is beyond the scope of this post. But there are a lot of great resources out there, and they were (and still are) really useful when I was just getting started.

The virtual machine approach #

When I was first learning about using guided fuzzers I had no clue at all how I should turn the opaque data that the fuzzer provided into something useful. In retrospect I was probably close to abandoning the idea and filing it away as not being useful for cases like iobuf where the goal is testing an API surface area.

Luckily a colleague of mine, https://github.com/felixguendling, had been down this path before and offered the advice of using an approach that modeled the problem as a virtual machine. In this approach the input data from libfuzzer is interpreted as a program expressed as bytecode.

Effectively, we let libfuzzer design the programs that we execute!

Operations #

Okay, so a virtual machine interpreter. Let’s start with the op codes. I created one op code for each iobuf operation of interest. Here is an abbreviated list:

enum class op_type : uint8_t {
    moves,
    trim_front,
    hexdump,
    reserve_memory,
    prepend_temporary_buffer,
    append_iobuf,
    append_char_array,
    max = append_char_array,
};

Some operations, like hexdump, do not require any operands, while others, like append_char_array, depend on additional operands such as a source of bytes. To keep things simple, the following op_spec type is used to represent an instance of any operation and its operands. We wrap all data in a std::string_view so that we have access to a convenient API for manipulating byte strings and so that we avoid unnecessary memory copies.

struct op_spec {
    op_type op;
    size_t size{0};
    std::string_view data;

    explicit op_spec(op_type op)
      : op(op) {}
};

Now we need a way to turn libfuzzer’s input data into a sequence of operations. Let’s walk through the key pieces from the bottom up. Consider the code snippet below. First, the input data from libfuzzer is used to construct an instance of driver where it is stored as the string view program_. And really going all in with the virtual machine analogy, the program counter pc_ points to the beginning of the input data.

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    std::string_view program(reinterpret_cast<const char*>(data), size);
    driver vm(program);

    ...
}

class driver {
    const std::string_view program_;
    std::string_view::iterator pc_;

public:
    explicit driver(std::string_view program)
      : program_(program)
      , pc_(program.cbegin()) {}

    ...
};

At a high-level the driver reads operations from the input program until the end of the program is reached. There are two methods used to drive this decoding process. First, read<T>, shown in the snippet below, is used to decode integers from the input program. It copies the bytes pointed to by the program counter into an instance of T, advances the program counter, and returns the value. When the end of the program is reached, an end_of_program exception is thrown. This exception, as we’ll see later, is a signal to cleanly stop execution because in general the input program will not be perfectly aligned. Without this trick, nearly all inputs would represent invalid programs.

template<typename T>
T read() {
    if (std::distance(pc_, program_.cend()) < static_cast<std::ptrdiff_t>(sizeof(T))) {
        throw end_of_program();
    }
    T ret;
    std::memcpy(&ret, pc_, sizeof(T));
    std::advance(pc_, sizeof(T));
    return ret;
}

The second method used to drive execution is next(). First, read<T> is used to decode the next integer from the program and interpret it as an op code. The integer is forced into the range of valid op codes; otherwise the fuzzing would be very inefficient, since a huge proportion of the possible integer values would be invalid op codes in this particular interpreter.

op_spec next() {
    // next byte is the op code
    using underlying = std::underlying_type_t<op_type>;
    const auto max_op = static_cast<underlying>(op_type::max);

    // the next op
    auto op = op_spec{static_cast<op_type>(
        read<underlying>() % (max_op + 1))};

And finally, the operands. Those are also going to be pulled out of the input program. I’ve abbreviated the cases for illustration; they are grouped into two categories: operations that need only a size operand, and those that need a size and data. Any data for operands is taken directly from the input to avoid large memory allocations; the program size itself bounds memory consumption.

    // needs size operand
    switch (op.op) {
    case op_type::trim_front:
    case op_type::append_char_array: {
        op.size = read<uint32_t>();
        break;
    }
    default:
        break;
    }

    // needs data operand
    switch (op.op) {
    case op_type::append_fragments:
    case op_type::append_char_array:
        op.size = std::min<size_t>(
            op.size, std::distance(pc_, program_.end()));
        op.data = {pc_, op.size};
        std::advance(pc_, op.data.size());
        break;

    default:
        break;
    }

    return op;
}

...
};
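
To make the decoding concrete, here is how one possible input would be interpreted, assuming a little-endian host and the (abbreviated) enum ordering shown earlier. The exact byte values are illustrative only.

// An 8-byte libfuzzer input:
//
//   0x06   0x03 0x00 0x00 0x00   'a' 'b' 'c'
//
// decodes as follows:
//
//   read<uint8_t>()  -> 0x06; 6 % (max_op + 1) == 6 -> op_type::append_char_array
//   read<uint32_t>() -> 3 (the size operand, in host byte order)
//   data operand     -> the next min(3, bytes remaining) == 3 bytes, "abc"
//
// In other words, the whole input is a one-instruction program:
// append_char_array("abc"). The next call to next() hits the end of the
// input and throws end_of_program, which ends the run cleanly.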

To recap, we now have an input program, and a method for converting it into a sequence of operations. Now we need to apply the operations. Here is an updated version of the libfuzzer upcall method that we saw earlier. As shown below, driver::operator() is invoked until a stopping condition is reached: either a clean shutdown or an exception.

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    // NOLINTNEXTLINE
    std::string_view d(reinterpret_cast<const char*>(data), size);
    driver vm(d);

    try {
        while (vm()) {
            vm.check();
        }
    } catch (...) {
        vm.print_trace();
        throw;
    }

    return 0;
}

The driver::operator() method is quite simple: it converts the end_of_program exception into a false return value that signals a clean shutdown, or it takes the successfully decoded operation and passes it to handle_op and then signals to the caller that there is more input to process.

bool operator()() {
    const auto op = [this]() -> std::optional<op_spec> {
        try {
            return next();
        } catch (const end_of_program&) {
            return std::nullopt;
        }
    }();

    if (op.has_value()) {
        handle_op(op.value());
        return true;
    }

    return false;
}

Next we’ll take a look at handle_op and see how an operation is run and what it means for an operation to succeed or fail.

Testing fuzz target properties #

So far we have seen some basic infrastructure for setting up libfuzzer to generate schedules of operations to test an interface. But what exactly are we going to test? This is a core challenge when fuzzing a target.

In our case with iobuf we’d like to do two things. First, we want to test that operations on iobuf behave as expected. That is, if some data is appended, for example, then no matter what that data is or what state the iobuf is in, the iobuf instance should reflect that the data was correctly appended.

Second, we want the fuzzer to achieve full code coverage of the iobuf implementation, and to find interesting executions and edge cases. We want this coverage because it is useful to know that we are, at the very least, exercising code that we believed was needed for some situation. Another reason is that we are going to turn on all of the sanitizers, which are exceptionally good at finding things like undefined behavior, but which only help on code that actually gets executed.

Let’s return to driver::handle_op. All this method does is forward the decoded operations to an instance of the type iobuf_ops, where all the important bits live.

class driver {
    iobuf_ops m_;

    void handle_op(op_spec op) {
        switch (op.op) {
        case op_type::copy:
            m_.copy();
            return;

        case op_type::moves:
            m_.moves();
            return;

        ...
        }
    }
};

The iobuf_ops type is conceptually very simple. It contains two members: an iobuf instance, and a reference implementation of the iobuf interface. The sequence of operations decoded from the libfuzzer input by the driver we designed above will target both containers. After an operation is applied the equivalence of the two containers will be checked. If they are different, then there is a mistake somewhere!

class iobuf_ops {
    iobuf buf;
    std::deque<std::string_view> ref;

The reference implementation is intentionally designed to be as simple as possible, in order to make visual inspection of its behavior clear, as well as to avoid any correlated behavior between it and the iobuf implementation.

Below are two example operations: appending data from an input, and trimming data from the front of the containers. When appending data the iobuf may coalesce data into existing fragments, or make allocation decisions based on heuristics. However, the reference implementation merely appends the input data to the std::deque container.

    void append_char_array(std::string_view payload) {
        buf.append(payload.data(), payload.size());
        ref.push_back(payload);
    }

    void trim_front(size_t size) {
        buf.trim_front(size);
        while (size && !ref.empty()) {
            if (size >= ref.front().size()) {
                size -= ref.front().size();
                ref.pop_front();
            } else {
                ref.front().remove_prefix(size);
                return;
            }
        }
    }

After each operation is completed the check method is called. As we can see below it performs some sanity checks on various properties. At the end, check_equals() is called to verify that the two containers are byte-for-byte equivalent. I’ll omit that here because it isn’t related to fuzzing, and is a little complicated.

    void check() const {
        const auto ref_size = std::accumulate(
          ref.begin(),
          ref.end(),
          size_t(0),
          [](size_t acc, std::string_view sv) { return acc + sv.size(); });

        if (buf.empty() != (ref_size == 0)) {
            throw std::runtime_error(fmt::format(
              "Iobuf empty state {} != reference empty state {}",
              buf.empty(),
              ref_size == 0));
        }

        if (buf.size_bytes() != ref_size) {
            throw std::runtime_error(fmt::format(
              "Iobuf size {} != reference size {}",
              buf.size_bytes(),
              ref_size));
        }

        check_equals();
    }
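
For completeness, a byte-for-byte comparison could look roughly like the following sketch. This is not the real check_equals(): it assumes that dereferencing the iobuf iterator yields a fragment with get() and size() accessors, which is not part of the abbreviated interface shown earlier.

    void check_equals() const {
        // Flatten both containers and compare. The real implementation can
        // avoid these allocations, but the flattened form is easy to reason
        // about.
        std::string flat_ref;
        for (const auto& sv : ref) {
            flat_ref.append(sv.data(), sv.size());
        }

        std::string flat_buf;
        for (auto it = buf.cbegin(); it != buf.cend(); ++it) {
            // assumed fragment accessors: get() -> const char*, size()
            flat_buf.append(it->get(), it->size());
        }

        if (flat_ref != flat_buf) {
            throw std::runtime_error("iobuf contents != reference contents");
        }
    }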

So how well did we do?

Results #

We did exceptionally well! We managed to get something like 99.9% code coverage, and even found some dead code. There are a few spots that coverage didn’t reach, and we aren’t yet sure if they are just hard to reach or if they are also dead code. That is an ongoing investigation.

In addition to the coverage, we found 3 real bugs. One example is that schedules containing memory reservations were more likely to produce incorrect behavior. This was because some functions did not take into account that an internal fragment may exist to provide capacity, but be logically empty. We don’t have any reason to believe these conditions were ever encountered in practice. But they could have been: making a memory reservation is a common optimization in our code base.
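
As a simplified reconstruction of that class of bug (not the actual failing schedule the fuzzer produced), consider:

iobuf buf;
buf.reserve_capacity(4096); // allocates a fragment but appends no bytes

// The buffer now owns a fragment that exists only to provide capacity.
// Code that equated "has a fragment" with "has data" would misreport the
// buffer as non-empty here, while the reference model is still empty.
assert(buf.empty());
assert(buf.size_bytes() == 0);

buf.append("x", 1); // later appends fill the reserved space
assert(buf.size_bytes() == 1);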

The one major area of the iobuf implementation that was not fuzzed is the futurized interfaces that must be run in the context of a seastar reactor thread. I’ll be posting a new article on how we handle that scenario.